DeepSeek and Tsinghua University Joint Research: Innovative Reward Model Inference Method Improves Scalability

Researchers from DeepSeek and Tsinghua University recently published a paper exploring how to scale reward model inference, work seen as laying groundwork for DeepSeek R2. Reinforcement learning is widely used in the large-scale post-training phase of large language models, but obtaining accurate reward signals for these models remains a challenge. The researchers found that pointwise generative reward modeling (GRM) improves a model's adaptability and its scalability at inference time. To this end, they propose a learning method called Self-Principled Critique Tuning (SPCT).
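The article does not go into SPCT's technical details, but the general idea of a pointwise generative reward model whose quality scales with inference compute can be sketched roughly as follows. This is a minimal illustration under assumptions, not DeepSeek's implementation: the names `generate_critique`, `extract_score`, and `scaled_reward` are hypothetical, and the critique generator is stubbed out with random scores in place of a real reward-model call.

```python
# Minimal sketch (assumptions, not DeepSeek's method): a pointwise generative
# reward model writes principles and a critique ending in a numeric score,
# and inference-time scaling samples several critiques and aggregates them.
import random
import re
from statistics import mean


def generate_critique(question: str, answer: str, seed: int) -> str:
    """Stand-in for a GRM call. A real system would sample this text
    from the reward model; here the score is random for illustration."""
    random.seed(hash((question, answer, seed)))
    score = random.randint(6, 9)  # placeholder score on a 1-10 scale
    return f"Principles: accuracy, clarity.\nCritique: ...\nScore: {score}"


def extract_score(critique: str) -> float | None:
    """Parse the numeric score the GRM was instructed to emit."""
    match = re.search(r"Score:\s*(\d+(?:\.\d+)?)", critique)
    return float(match.group(1)) if match else None


def scaled_reward(question: str, answer: str, k: int = 8) -> float:
    """Inference-time scaling: sample k independent critiques and average
    their scores, so spending more compute yields a steadier reward signal."""
    scores = []
    for seed in range(k):
        parsed = extract_score(generate_critique(question, answer, seed))
        if parsed is not None:
            scores.append(parsed)
    return mean(scores) if scores else 0.0


if __name__ == "__main__":
    print(scaled_reward("Explain RL post-training.", "RL fine-tunes the model ...", k=8))
```

The point the sketch tries to capture is the scalability claim in the article: the reward signal can be made more reliable by increasing inference compute (a larger k) rather than by retraining the reward model.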
